ONNX model
Investigating White-Box Attacks for On-Device Models
Mingyi Zhou, Xiang Gao, Jing Wu, Kui Liu, Hailong Sun, Li Li
Numerous mobile apps have leveraged deep learning capabilities. However, on-device models are vulnerable to attacks as they can be easily extracted from their corresponding mobile apps. Existing on-device attacking approaches only generate black-box attacks, which are far less effective and efficient than white-box strategies. This is because mobile deep learning frameworks like TFLite do not support gradient computation, which is necessary for white-box attacking algorithms. Thus, we argue that existing findings may underestimate the harmfulness of on-device attacks. To this end, we conduct a study to answer the research question: can on-device models be directly attacked via white-box strategies? We first systematically analyze the difficulties of transforming an on-device model into its debuggable version, and propose a Reverse Engineering framework for On-device Models (REOM), which automatically reverses a compiled on-device TFLite model into a debuggable model. Specifically, REOM first transforms compiled on-device models into the Open Neural Network Exchange (ONNX) format, then removes the non-debuggable parts, and converts them into a debuggable DL model format that attackers can exploit in a white-box setting. Our experimental results show that our approach achieves automated transformation for 244 TFLite models. Compared with previous attacks using surrogate models, REOM enables attackers to achieve higher attack success rates with attack perturbations a hundred times smaller. In addition, because the ONNX platform has plenty of tools for model format exchange, the proposed ONNX-based method can be adapted to other model formats. Our findings emphasize the need for developers to carefully consider their model deployment strategies and to use white-box methods to evaluate the vulnerability of on-device models.
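The REOM tool itself is not reproduced here; as a rough, hedged illustration of the kind of pipeline the abstract describes, the open-source tf2onnx and onnx2torch converters can take a compiled TFLite file to ONNX and then to a gradient-capable PyTorch module. File names, the opset, and the input shape below are placeholder assumptions, and real on-device models may contain ops these converters do not handle.

```python
# Hypothetical sketch (not REOM): TFLite -> ONNX -> PyTorch so that input
# gradients (needed by white-box attacks such as PGD) become available.
import onnx
import tf2onnx
import torch
from onnx2torch import convert

# 1. Convert the compiled on-device TFLite model to ONNX (opset is assumed).
onnx_model, _ = tf2onnx.convert.from_tflite("extracted_model.tflite", opset=13)
onnx.checker.check_model(onnx_model)

# 2. Convert the ONNX graph to a debuggable PyTorch module.
torch_model = convert(onnx_model).eval()

# 3. Gradients w.r.t. the input can now be computed directly.
x = torch.rand(1, 224, 224, 3, requires_grad=True)  # placeholder NHWC input shape
logits = torch_model(x)
logits.sum().backward()
print(x.grad.abs().max())
```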
Stitching non-max suppression (NMS) into an exported YOLOv8n ONNX model
Following my previous post on exploring YOLOv8, I have been stuck on using the YOLOv8 model outside PyTorch, because the directly exported model gives a result with dimensions like [batch size, 5, 8400], which encapsulates overlapping bounding boxes together with confidence scores. Targets such as TF Lite (with the object detection API) would require post-processing this result into non-overlapping bounding boxes and corresponding confidence scores. As I observed, the YOLO class is initialized with a member "model", which is the core model producing that [batch size, 5, 8400]-shaped result, and its forward function calls a predict function (which one depends on the task, be it object detection, classification or segmentation). Given that the code's exports towards TensorFlow-related formats (TF Lite, TensorFlow.js…) go through ONNX to a TensorFlow saved model (via the onnx2tf package) and then on to the target, ONNX seems to be a good place to add the necessary NMS operation; a second reason is that the ONNX NMS operation could be better optimized (compared to torchvision's converted NMS operation).
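A hedged sketch of what "stitching" the operator could look like with the onnx helper API follows. The graph output name "output0", the single-class [batch, 5, 8400] layout, and the thresholds are assumptions, not the exact Ultralytics graph; a real export may need different slicing.

```python
# Hypothetical sketch: append Slice/Transpose/NonMaxSuppression nodes to an
# exported YOLOv8n ONNX model so the graph itself emits selected box indices.
import numpy as np
import onnx
from onnx import TensorProto, helper, numpy_helper

model = onnx.load("yolov8n.onnx")
graph = model.graph

# Constants used by the Slice nodes and by NonMaxSuppression.
graph.initializer.extend([
    numpy_helper.from_array(np.array([0], dtype=np.int64), "box_starts"),
    numpy_helper.from_array(np.array([4], dtype=np.int64), "box_ends"),
    numpy_helper.from_array(np.array([4], dtype=np.int64), "score_starts"),
    numpy_helper.from_array(np.array([5], dtype=np.int64), "score_ends"),
    numpy_helper.from_array(np.array([1], dtype=np.int64), "slice_axes"),
    numpy_helper.from_array(np.array([100], dtype=np.int64), "max_boxes_per_class"),
    numpy_helper.from_array(np.array([0.45], dtype=np.float32), "iou_threshold"),
    numpy_helper.from_array(np.array([0.25], dtype=np.float32), "score_threshold"),
])

graph.node.extend([
    # Split the raw [batch, 5, 8400] head output into boxes and scores.
    helper.make_node("Slice", ["output0", "box_starts", "box_ends", "slice_axes"], ["boxes_chw"]),
    helper.make_node("Slice", ["output0", "score_starts", "score_ends", "slice_axes"], ["scores"]),
    # NMS wants boxes as [batch, num_boxes, 4]; scores are already [batch, num_classes, num_boxes].
    helper.make_node("Transpose", ["boxes_chw"], ["boxes"], perm=[0, 2, 1]),
    # Each selected row is (batch_index, class_index, box_index).
    helper.make_node(
        "NonMaxSuppression",
        ["boxes", "scores", "max_boxes_per_class", "iou_threshold", "score_threshold"],
        ["selected_indices"],
        center_point_box=1,  # YOLO boxes are (cx, cy, w, h)
    ),
])
graph.output.append(
    helper.make_tensor_value_info("selected_indices", TensorProto.INT64, [None, 3])
)

onnx.checker.check_model(model)
onnx.save(model, "yolov8n_nms.onnx")
```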
HE-MAN -- Homomorphically Encrypted MAchine learning with oNnx models
Martin Nocker, David Drexel, Michael Rader, Alessio Montuoro, Pascal Schöttle
Machine learning (ML) algorithms are increasingly important for the success of products and services, especially considering the growing amount and availability of data. This also holds for areas handling sensitive data, e.g. applications processing medical data or facial images. However, people are reluctant to pass their personal sensitive data to an ML service provider. At the same time, service providers have a strong interest in protecting their intellectual property and therefore refrain from publicly sharing their ML models. Fully homomorphic encryption (FHE) is a promising technique to enable individuals to use ML services without giving up privacy while at the same time protecting the ML model of the service provider. Despite steady improvements, FHE is still hardly integrated into today's ML applications. We introduce HE-MAN, an open-source two-party machine learning toolset for privacy-preserving inference with ONNX models and homomorphically encrypted data. Neither the model nor the input data has to be disclosed. HE-MAN abstracts cryptographic details away from the users, so expertise in FHE is not required for either party. HE-MAN's security relies on its underlying FHE schemes. For now, we integrate two different homomorphic encryption schemes, namely Concrete and TenSEAL. Compared to prior work, HE-MAN supports a broad range of ML models in ONNX format out of the box without sacrificing accuracy. We evaluate the performance of our implementation on different network architectures classifying handwritten digits and performing face recognition, and report the accuracy and latency of homomorphically encrypted inference. Cryptographic parameters are derived automatically by the tools. We show that the accuracy of HE-MAN is on par with models using plaintext input, while inference latency is several orders of magnitude higher than in the plaintext case.
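This is not HE-MAN itself, but a minimal sketch of the kind of encrypted inference it abstracts away, using TenSEAL (one of the two schemes it integrates) on a single toy linear layer. The CKKS parameters, shapes, and weights are assumptions chosen only for illustration.

```python
# Minimal TenSEAL sketch: the client encrypts its input, the server evaluates a
# linear layer on ciphertexts, and only the client can decrypt the result.
import numpy as np
import tenseal as ts

# Client side: create a CKKS context and encrypt the input vector.
context = ts.context(
    ts.SCHEME_TYPE.CKKS,
    poly_modulus_degree=8192,
    coeff_mod_bit_sizes=[60, 40, 40, 60],
)
context.global_scale = 2 ** 40
context.generate_galois_keys()

x = np.random.rand(16)                      # e.g. a flattened feature vector
enc_x = ts.ckks_vector(context, x)

# Server side: evaluate a toy linear layer without ever seeing x.
W = np.random.rand(16, 4)
b = np.random.rand(4)
enc_y = enc_x.matmul(W.tolist()) + b.tolist()   # homomorphic vector-matrix product

# Client side: decrypt and compare against the plaintext computation.
print("encrypted:", enc_y.decrypt())
print("plaintext:", (x @ W + b).tolist())
```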
Accelerate and Productionize ML Model Inferencing Using Open-Source Tools
You've finally got that perfect trained model for your data set. To run and deploy it to production, there's a host of issues that lie ahead: latency, environments, framework compatibility, security, deployment targets…there is a lot to consider! In this tutorial, we'll look at solutions for these common challenges using ONNX and related tooling. ONNX (Open Neural Network eXchange), an open-source graduate project under the Linux Foundation LF AI, defines a standard format for machine learning models that enables AI developers to use their frameworks and tools of choice to train, infer, and deploy on a variety of hardware targets.
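As a small illustration of the train-anywhere, deploy-anywhere workflow described above (not taken from the tutorial itself), the sketch below exports a toy PyTorch model to ONNX and serves it with ONNX Runtime; the model, file names, and shapes are placeholders.

```python
# Hypothetical end-to-end sketch: train in one framework (PyTorch here),
# export to the framework-neutral ONNX format, run with ONNX Runtime.
import numpy as np
import torch
import torch.nn as nn
import onnxruntime as ort

model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2)).eval()
dummy = torch.randn(1, 10)

# Export the trained model to ONNX.
torch.onnx.export(model, dummy, "classifier.onnx",
                  input_names=["input"], output_names=["logits"])

# Deploy: any ONNX-compatible runtime can now serve the same file.
session = ort.InferenceSession("classifier.onnx", providers=["CPUExecutionProvider"])
logits = session.run(["logits"], {"input": np.random.rand(1, 10).astype(np.float32)})[0]
print(logits.shape)  # (1, 2)
```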
Integrating Scikit-learn Machine Learning Models into the Microsoft .NET Ecosystem
While being part of a team working on designing and developing a lead scoring system prototype, I faced the challenge of integrating machine learning models into a target environment built around the Microsoft .NET ecosystem. Technically, I implemented the lead scoring predictive model using Scikit-learn's built-in Logistic Regression algorithm. For the phases of initial data analysis, data preprocessing, exploratory data analysis (EDA), and data preparation for the model building itself, I used the Jupyter Notebook environment powered by the Anaconda distribution for scientific computing in Python. I had previously investigated and worked with Python through Flask, a micro web framework written in that language. This time, however, I aimed to integrate, or deploy, the machine learning model written in Python into the .NET ecosystem, using the C# programming language and the Visual Studio IDE.
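A hedged sketch of the Python half of that bridge is shown below: train a scikit-learn Logistic Regression model, convert it to ONNX with skl2onnx, and hand the resulting file to the .NET side. The data, feature count, and file name are toy assumptions, not the actual lead scoring model.

```python
# Hypothetical sketch: scikit-learn -> ONNX, so the model can be consumed from C#.
import numpy as np
from sklearn.linear_model import LogisticRegression
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType

X = np.random.rand(200, 6).astype(np.float32)   # placeholder lead features
y = (X.sum(axis=1) > 3).astype(int)             # placeholder labels

clf = LogisticRegression(max_iter=1000).fit(X, y)

# Declare the input signature and serialize the model to ONNX.
onnx_model = convert_sklearn(
    clf, initial_types=[("float_input", FloatTensorType([None, 6]))]
)
with open("lead_scoring.onnx", "wb") as f:
    f.write(onnx_model.SerializeToString())
# The resulting lead_scoring.onnx can then be loaded on the .NET side,
# e.g. via Microsoft.ML.OnnxRuntime or ML.NET's ONNX scoring transform.
```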
r/MachineLearning - [P] Run inference with zero-dependency C code and ONNX
In short, ONNX provides an Open Neural Network Exchange format. This format describes a huge set of operators that can be combined to create every type of machine learning model you have ever heard of, from a simple neural network to complex deep convolutional networks. Some examples of operators are: matrix multiplication, convolution, addition, maxpool, sin, cosine, you name it! They provide a standardised set of operators here. So we can say that ONNX provides a layer of abstraction over ML models, which makes all frameworks compatible with one another.
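To make the "model = graph of standard operators" point concrete (this example is added here, not from the original post), the sketch below builds a tiny ONNX model by hand from just two operators, MatMul and Add, i.e. one linear layer; names and shapes are arbitrary.

```python
# Minimal sketch: compose an ONNX model directly from standardized operators.
import numpy as np
import onnx
from onnx import TensorProto, helper, numpy_helper

W = numpy_helper.from_array(np.random.rand(4, 3).astype(np.float32), "W")
b = numpy_helper.from_array(np.random.rand(3).astype(np.float32), "b")

matmul = helper.make_node("MatMul", ["x", "W"], ["xw"])
add = helper.make_node("Add", ["xw", "b"], ["y"])

graph = helper.make_graph(
    [matmul, add],
    "tiny_linear",
    inputs=[helper.make_tensor_value_info("x", TensorProto.FLOAT, [None, 4])],
    outputs=[helper.make_tensor_value_info("y", TensorProto.FLOAT, [None, 3])],
    initializer=[W, b],
)
model = helper.make_model(graph)
onnx.checker.check_model(model)
onnx.save(model, "tiny_linear.onnx")
```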
Benanza: Automatic $\mu$Benchmark Generation to Compute "Lower-bound" Latency and Inform Optimizations of Deep Learning Models on GPUs
Cheng Li, Abdul Dakkak, Jinjun Xiong, Wen-mei Hwu
As Deep Learning (DL) models have been increasingly used in latency-sensitive applications, there has been a growing interest in improving their response time. An important avenue for such improvement is to profile the execution of these models and characterize their performance to identify possible optimization opportunities. However, the current profiling tools lack the highly desired abilities to characterize ideal performance, identify sources of inefficiency, and quantify the benefits of potential optimizations. Such deficiencies have led to slow characterization/optimization cycles that cannot keep up with the fast pace at which new DL models are introduced. We propose Benanza, a sustainable and extensible benchmarking and analysis design that speeds up the characterization/optimization cycle of DL models on GPUs. Benanza consists of four major components: a model processor that parses models into an internal representation, a configurable benchmark generator that automatically generates micro-benchmarks given a set of models, a database of benchmark results, and an analyzer that computes the "lower-bound" latency of DL models using the benchmark data and informs optimizations of model execution. The "lower-bound" latency metric estimates the ideal model execution on a GPU system and serves as the basis for identifying optimization opportunities in frameworks or system libraries. We used Benanza to evaluate 30 ONNX models in MXNet, ONNX Runtime, and PyTorch on 7 GPUs ranging from Kepler to the latest Turing, and identified optimizations in parallel layer execution, cuDNN convolution algorithm selection, framework inefficiency, layer fusion, and using Tensor Cores.
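This is not Benanza itself, but a toy illustration of its first stage, a "model processor" that parses an ONNX model into the layer inventory from which per-layer micro-benchmarks would then be generated; the model file name is a placeholder.

```python
# Hypothetical sketch: summarize the operators in an ONNX model as a first step
# toward generating one micro-benchmark per distinct layer configuration.
from collections import Counter
import onnx

model = onnx.load("resnet50.onnx")
op_counts = Counter(node.op_type for node in model.graph.node)

# Here we only report how often each operator type occurs.
for op_type, count in op_counts.most_common():
    print(f"{op_type:20s} {count}")
```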
Announcing ONNX Runtime 1.0 - Open Source Blog
One year after ONNX Runtime's initial preview release, we're excited to announce v1.0 of the high-performance machine learning model inferencing engine. This release marks our commitment to API stability for the cross-platform, multi-language APIs, and introduces a breadth of performance optimizations, broad operator coverage, and pluggable accelerators to take advantage of new and exciting hardware developments. In its first year, ONNX Runtime was shipped to production for more than 60 models at Microsoft, with adoption from a range of consumer and enterprise products, including Office, Bing, Cognitive Services, Windows, Skype, Ads, and others. These models span from speech to image to text (including state-of-the-art models such as BERT), and ONNX Runtime has improved the performance of these models by an average of 2.5x over previous inferencing solutions. In addition to performance gains, the interoperable ONNX model format has also provided increased infrastructure flexibility, allowing teams to use a common runtime to scalably deploy a breadth of models to a range of hardware.
ONNX Runtime: a one-stop shop for machine learning inferencing
Organizations that want to leverage AI at scale must overcome a number of challenges around model training and model inferencing. Today, there is a plethora of tools and frameworks that accelerate model training, but inferencing remains a tough nut to crack due to the variety of environments that models need to run in. For example, the same AI model might need to be inferenced on cloud GPUs as well as desktop CPUs and even edge devices. Optimizing a single model for so many different environments takes time, let alone hundreds or thousands of models. In this blog post, we'll show you how Microsoft tackled this challenge internally and how you can leverage the latest version of the same technology.
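A hedged sketch of the "one model, many targets" idea follows: the same ONNX file is served by ONNX Runtime with a provider list that prefers a GPU where one exists and falls back to CPU elsewhere. The model path and input shape are placeholders.

```python
# Hypothetical sketch: select execution providers per environment at load time.
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession(
    "model.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)
print("Running on:", session.get_providers())

input_name = session.get_inputs()[0].name
x = np.random.rand(1, 3, 224, 224).astype(np.float32)  # assumed input shape
outputs = session.run(None, {input_name: x})
print(outputs[0].shape)
```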
Run ONNX models with Amazon Elastic Inference Amazon Web Services
At re:Invent 2018, AWS announced Amazon Elastic Inference (EI), a new service that lets you attach just the right amount of GPU-powered inference acceleration to any Amazon EC2 instance. This is also available for Amazon SageMaker notebook instances and endpoints, bringing acceleration to built-in algorithms and to deep learning environments. In this blog post, I show how to use the models in the ONNX Model Zoo on GitHub to perform inference by using MXNet with Elastic Inference Accelerator (EIA) as a backend. Amazon Elastic Inference allows you to attach low-cost GPU-powered acceleration to Amazon EC2 and Amazon SageMaker instances to reduce the cost of running deep learning inference by up to 75 percent. Amazon Elastic Inference provides support for Apache MXNet, TensorFlow, and ONNX models.
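A heavily hedged sketch of the workflow the post describes is below: load an ONNX Model Zoo model with MXNet and bind it to the Elastic Inference accelerator context. The mx.eia() context exists only in the AWS-provided, EIA-enabled MXNet build, and the model file, input name, and shape are placeholder assumptions for a Model Zoo image classifier.

```python
# Hypothetical sketch: ONNX model -> MXNet Module bound to an EIA context.
import mxnet as mx
import numpy as np

sym, arg_params, aux_params = mx.contrib.onnx.import_model("resnet50v2.onnx")

ctx = mx.eia()  # Elastic Inference accelerator attached to the instance
mod = mx.mod.Module(symbol=sym, data_names=["data"], label_names=None, context=ctx)
mod.bind(for_training=False, data_shapes=[("data", (1, 3, 224, 224))])
mod.set_params(arg_params, aux_params, allow_missing=True)

batch = mx.io.DataBatch([mx.nd.array(np.random.rand(1, 3, 224, 224))])
mod.forward(batch)
print(mod.get_outputs()[0].shape)
```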